Applications in Plant Sciences
Wiley
All preprints, ranked by how well they match Applications in Plant Sciences's content profile, based on 21 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Boughalmi, K.; Santacruz Endara, P. G.; Bennett, L. A.; Ecarnot, M.; Bazan, S.; Bastianelli, D.; Bonnal, L.; Couvreur, T. L. P.
Premise: Herbarium collections offer an unparalleled archive of plant biodiversity, but their use for species identification through spectral data remains constrained by uncertain effects of preservation histories. This study assesses whether herbarium specimens can reliably predict species based on their leaf reflectance spectra, despite variations in age, geographic origin, or conservation method, under limited sample size conditions. Methods: We scanned herbarium specimens of different ages and geographic distributions from 14 species of the pantropical Annonaceae. In addition, we used a second dataset of 9 species in which some specimens were conserved in alcohol prior to drying and some were not. We used five supervised classification models frequently used for high-dimensional data such as spectroscopy. Results: All models achieved high accuracy (>80%) when trained on multiple specimens per species. However, when using only one specimen per species, accuracy varied substantially depending on the taxon. Discussion: Our findings demonstrate that herbarium specimens often retain a strong taxonomic signal in their spectra; however, inter-individual variability affects accuracy in some taxa. These findings confirm the usefulness of herbarium spectroscopy as a non-destructive tool for species identification and offer a promising avenue for digitizing historical biodiversity data into high-dimensional trait space.
Moore-Pollard, E. R.; Ellestad, P.; Mandel, J.
Phylogenomic discordance is pervasive and cannot always be resolved by increasing the amount of sequencing data alone. Biological processes such as polyploidy, hybridization, and incomplete lineage sorting are major contributors to discordance and must be accounted for to avoid misleading evolutionary interpretations. To better understand how these processes influence phylogenetic reconstruction, we conducted a comprehensive phylogenomic study in the complex genus Packera. With over 90 species and varieties, 40% of which exhibit polyploidy, aneuploidy, or other cytological complexities, Packera presents significant challenges for phylogenetic reconstruction. Given these complexities, we assessed the effect of different published paralog processing methods on the resulting evolutionary relationships and phylogenetic support of this group. We then applied three of these methods to evaluate their impact on tree topology and our understanding of Packera's evolutionary history by constructing a time-calibrated phylogeny, reconstructing historical biogeography, and testing for ancient reticulation. Phylogenetic outcomes varied based on the paralog processing method used, with no method consistently outperforming the others. Our findings highlight the large impact of orthology inference and paralog processing on phylogenomic analyses, particularly in polyploid-rich groups such as Packera, and we offer guidance on methodological impacts along with practical recommendations. We note that gaining a robust understanding of Packera's evolutionary history requires more than computational approaches alone. While technological advancements have greatly expanded our ability to analyze genomic data, effective phylogenomic research still relies on strong taxon sampling and detailed species knowledge.
Without careful attention to the biological context, such as reproductive boundaries, cytological variation, ecological interactions, and historical biogeographic processes, phylogenomic studies risk misinterpreting evolutionary history and processes. By accounting for these factors, we can begin to improve the accuracy of evolutionary reconstructions and gain deeper insights into the complex history of plant diversification.
Canales, N. A.; Gardner, E.; Walker, K.; Gress, T.; Bieker, V.; Martin, M. D.; Nesbitt, M.; Antonelli, A.; Ronsted, N.; Barnes, C.
Over the last few centuries, millions of plant specimens have been collected and stored within herbaria and biocultural collections. They therefore represent a considerable resource for a broad range of scientific uses. However, collections degrade over time, and it is therefore increasingly difficult to characterise their genetic signatures. Here, we genotyped highly degraded Cinchona barks and leaves from herbaria using two separate high-throughput sequencing (HtS) methods and compared their performance. We genotyped specimens using genome skimming, the most commonly performed HtS technique. We additionally used a recently developed capture bait set (Angiosperms353) for a target enrichment approach. Specifically, phylogenomic analyses of modern leaf and historical barks of Cinchona were performed, including 23 historical barks and six fresh leaf specimens. We found that samples degraded over time, which directly reduced the quantity and quality of the data produced by both methodologies (in terms of reads mapped to the references). However, we found that both approaches generated enough data to infer phylogenetic relationships, even between highly degraded specimens that are over 230 years old. Notably, the target capture kit produced data for target nuclear loci and also chloroplast data, which allowed for phylogenies to be inferred from both genomes, whereas genome skimming yielded usable data only for the chloroplast. We therefore find the Angiosperms353 target capture kit a powerful alternative to genome skimming, which can be used to obtain more information from herbarium specimens, and ultimately additional cultural benefits.
Marx, H. E.; Jorgensen, S. A.; Wisely, E.; Li, Z.; Dlugosch, K. M.; Barker, M. S.
Premise of the study: Large scale projects such as NEON are collecting ecological data on entire biomes to track and understand plant responses to climate change. NEON provides an opportunity for researchers to launch community transcriptomic projects that ask integrative questions in ecology and evolution. We conducted a pilot study to investigate the challenges of collecting RNA-seq data from phylogenetically diverse NEON plant communities, including species with diploid and polyploid genomes. Methods: We used Illumina NextSeq to generate >20 Gb of RNA-seq for each of 24 vascular plant species representing 12 genera and 9 families at the Harvard Forest NEON site. Each species was sampled twice, in July and August 2016. We used Transrate, BUSCO, and GO analyses to assess transcriptome quality and content. Results: We obtained nearly 650 Gb of RNA-seq data that assembled into more than 755,000 translated protein sequences across the 24 species. We observed only modest differences in assembly quality scores across a range of k-mer values. On average, transcriptomes contained hits to >70% of loci in the BUSCO database. We found no significant difference in the number of assembled and annotated genes between diploid and polyploid transcriptomes. Discussion: Our resource provides new RNA-seq datasets for 24 species of vascular plants in Harvard Forest. Challenges associated with this type of study included recovery of high quality RNA from diverse species and access to NEON sites for genomic sampling. Overcoming these challenges offers clear opportunities for large scale studies at the intersection of ecology and genomics.
Moore-Pollard, E. R.; Jones, D. S.; Mandel, J. R.
Premise: The sunflower family specific probe set, Compositae-1061, has enabled family-wide phylogenomic studies and investigations at lower-taxonomic levels by targeting 1,000+ genes. However, it generally lacks resolution at the genus to species level, especially in groups with complex evolutionary histories including polyploidy and hybridization. Methods: In this study, we developed a new Hyb-Seq probe set, Compositae-ParaLoss-1272, designed to target orthologous loci in Asteraceae family members. We tested its efficiency across the family by simulating target-enrichment sequencing in silico. Additionally, we tested its effectiveness at lower taxonomic levels in genus Packera, which has a complex evolutionary and taxonomic history. We performed Hyb-Seq with Compositae-ParaLoss-1272 for 19 taxa which were previously studied using the Compositae-1061 probe set. Sequences from both probe sets were used to generate phylogenies, compare topologies, and assess node support. Results: We report that Compositae-ParaLoss-1272 captured loci across all tested Asteraceae members. Additionally, Compositae-ParaLoss-1272 had less gene tree discordance, recovered considerably fewer paralogous sequences, and retained longer loci than Compositae-1061. Discussion: Given the complexity of plant evolutionary histories, assigning orthology for phylogenomic analyses will continue to be challenging. However, we anticipate this new probe set will provide improved resolution and utility for studies at lower-taxonomic levels and complex groups in the sunflower family.
Gerelle, W. K.; Jost, M.; Marques, I.; Les, D.; Vallejos, R.; Little, S.; Sinn, B. T.; Sokoloff, D. D.; Macfarlane, T. D.; Iles, W.; Feild, T.; Mathews, S.; Moore, M.; Couvreur, T. L. P.; Sauquet, H.; Wanke, S.; Graham, S. W.
Premise: Plastid-based data sets continue to play a major role in our understanding of early flowering-plant relationships, although organellar genomes of major lineages outside the monocots and eudicots remain under-sampled. A tendency of mitochondrial RNA-edit sites to mislead phylogenetic analysis in mixed transcriptomic/genomic data sets needs attention in angiosperm-wide studies, which only rarely consider mitochondrial genomes. Methods: We compared mitochondrial- vs. plastid-based phylogenomic inferences, examined the effect of removing putative RNA-edit sites from mitochondrial data, and performed combined organellar analysis (plastid plus filtered mitochondrial genomes). We expanded taxon sampling for multiple angiosperm lineages for phylogenomic analysis using both organellar genomes, representing several poorly sampled lineages (in particular Degeneriaceae, Trimeniaceae) with smaller (few-gene) data sets. Results: Plastid-based inferences recover well-supported relationships that align with and build upon previous studies, and recover well-supported internal relationships for two ANA-grade families (Hydatellaceae, Trimeniaceae) sampled for nearly all species. By contrast, unfiltered mitochondrial inferences of angiosperm phylogeny are generally poorly supported, and recover anomalous relationships compared to plastid-based inferences. However, removing putative mitochondrial RNA-edit sites dramatically reduces inter-organellar conflict and improves overall branch support. Conclusions: We accounted for phylogenomic discordance between the two organellar genomes regarding overall angiosperm-wide relationships and filled in taxonomic gaps (poorly sampled lineages). Removing RNA-edit sites substantially improves congruence in inter-organellar inferences by effectively correcting a systematic bias in mitochondrial data.
Uncertain relationships persist among five major mesangiosperm lineages in plastid-based inferences, but a clade comprising Chloranthales, Ceratophyllales and eudicots is well supported by filtered mitochondrial data.
Ranjbaran, Y.; Maurin, O.; Canadelli, E.; Morosinotto, T.; Weech, M.-H.; Kersey, P.; Antonelli, A.; Baker, W. J.; Sales, G. J.; Dal Grande, F.
DNA recovered from herbarium specimens represents a vital asset in botanical research, playing a pivotal role in unravelling the evolution, diversity, and ecological dynamics of plants. Despite its importance, challenges such as fragmented DNA and insufficient sequencing yields render molecular data retrieval a high-risk and costly endeavour involving the use of non-replaceable herbarium specimens. Here, we propose a framework based on Artificial Intelligence (AI) to forecast the success of genomic DNA extraction suitable for sequencing from herbarium samples. Our model integrates morphological characteristics and sample colour derived from scanned herbarium images, metadata including sample age and locality, and DNA quantity measurements of samples. We train a deep learning algorithm with ca. 2,000 specimens that have been digitized and sequenced in the framework of the Plant and Fungal Trees of Life (PAFTOL) Project, spanning from year 1832 to the present. As training datasets increase with ongoing digitization and genomic sequencing efforts, our AI predictive model can support researchers in selecting the herbarium samples with the highest likelihood of yielding high-quality genomic DNA from amongst a vast array of globally distributed candidate specimens. Our approach enhances the contribution of herbarium-derived DNA in large-scale studies and facilitates the utilisation of historical collections for a deeper understanding of plant evolution and ecology, with implications for conservation.
Blischak, P. D.; Thompson, C. E.; Waight, E. M.; Kubatko, L. S.; Wolfe, A. D.
Reticulate evolutionary events are hallmarks of plant phylogeny, and are increasingly recognized as common occurrences in other branches of the Tree of Life. However, inferring the evolutionary history of admixed lineages presents a difficult challenge for systematists due to genealogical discordance caused by both incomplete lineage sorting (ILS) and hybridization. Methods that accommodate both of these processes are continuing to be developed, but they often do not scale well to larger numbers of species. An additional complicating factor for many plant species is the occurrence of whole genome duplication (WGD), which can have various outcomes on the genealogical history of haplotypes sampled from the genome. In this study, we sought to investigate patterns of hybridization and WGD in two subsections from the genus Penstemon (Plantaginaceae; subsect. Humiles and Proceri), a speciose group of angiosperms that has rapidly radiated across North America. Species in subsect. Humiles and Proceri occur primarily in the Pacific Northwest of the United States, occupying habitats such as mesic, subalpine meadows, as well as more well-drained substrates at varying elevations. Ploidy levels in the subsections range from diploid to hexaploid, and it is hypothesized that most of the polyploids are hybrids (i.e., allopolyploids). To estimate phylogeny in these groups, we first developed a method for estimating quartet concordance factors (QCFs) from multiple sequences sampled per lineage, allowing us to model all haplotypes from a polyploid. QCFs represent the proportion of gene trees that support a particular species quartet relationship, and are used for species network estimation in the program SNaQ (Solis-Lemus & Ane. 2016. PLoS Genet. 12:e1005896). Using phased haplotypes for nuclear amplicons, we inferred species trees and networks for 38 taxa from P. subsect. Humiles and Proceri. 
Our phylogenetic analyses recovered two clades comprising a mix of taxa from both subsections, indicating that the current taxonomy for these groups is inconsistent with our estimates of phylogeny. In addition, there was little support for hypotheses regarding the formation of putative allopolyploid lineages. Overall, we found evidence for the effects of both ILS and admixture on the evolutionary history of these species, but were able to evaluate our taxonomic hypotheses despite high levels of gene tree discordance. Our method for estimating QCFs from multiple haplotypes also allowed us to include species of varying ploidy levels in our analyses, which we anticipate will help to facilitate estimation of species networks in other plant groups as well.
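The definition of a quartet concordance factor given in this abstract (the proportion of gene trees supporting a particular quartet relationship) can be illustrated with a minimal sketch. It assumes the quartet split displayed by each gene tree has already been extracted; the function names and the toy input are hypothetical, and this is neither the authors' multi-haplotype method nor the SNaQ implementation.

```python
from collections import Counter


def canonical_split(pair_a, pair_b):
    """Order-independent key for the unrooted quartet split pair_a | pair_b."""
    return frozenset([frozenset(pair_a), frozenset(pair_b)])


def quartet_concordance_factors(gene_tree_splits):
    """Return the proportion of gene trees supporting each quartet split.

    gene_tree_splits: iterable of (pair_a, pair_b) tuples, one per gene tree,
    giving the split that tree displays for a fixed set of four taxa.
    """
    counts = Counter(canonical_split(a, b) for a, b in gene_tree_splits)
    total = sum(counts.values())
    return {split: n / total for split, n in counts.items()}


# Toy input (hypothetical): four gene trees for taxa A, B, C, D.
splits = [
    (("A", "B"), ("C", "D")),
    (("B", "A"), ("D", "C")),  # same split, written in a different order
    (("A", "C"), ("B", "D")),
    (("A", "B"), ("C", "D")),
]
qcf = quartet_concordance_factors(splits)
ab_cd = qcf[canonical_split(("A", "B"), ("C", "D"))]
ac_bd = qcf[canonical_split(("A", "C"), ("B", "D"))]
```

Here three of the four toy gene trees display AB|CD, so that split receives the largest concordance factor; unequal values for the two minor splits are the kind of signal network methods use to distinguish hybridization from incomplete lineage sorting alone.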
Mander, L.; Bauer, M.; Hang, H.; Mio, W.
Leaf shape is a key plant trait that varies enormously. The diversity of leaf shape, and the range of applications for data on this trait, requires frequent methodological developments so that researchers have an up-to-date toolkit with which to quantify leaf shape. We generated a dataset of 468 leaves produced by Ginkgo biloba, and 24 fossil leaves produced by evolutionary relatives of extant Ginkgo. We quantified the shape of each leaf by developing a geometric method based on elastic curves and a topological method based on persistent homology. Our geometric method indicates that shape variation in our modern sample is dominated by leaf size, furrow depth, and the angle of the two lobes at the base of the leaf that is also related to leaf width. Our topological method indicates that shape variation in our modern sample is dominated by leaf size and furrow depth. We have applied both methods to modern and fossil material: the methods are complementary, identifying similar primary patterns of variation, but also revealing some different aspects of morphological variation. Our topological approach distinguishes long-shoot leaves from short-shoot leaves and both methods indicate that leaf shape influences or is at least related to leaf area.
Kling, M. M.; Gonzalez-Ramirez, I. S.; Carter, B.; Borokini, I.; Mishler, B. D.
Spatial phylogenetics is premised on the idea that species are not discrete categorical entities but instead lie on a hierarchical evolutionary continuum that contains rich biological information valuable for quantifying spatial biodiversity patterns. Yet while spatial phylogenetic approaches use quantitative information to represent phylogenetic patterns, most have continued to rely on methods that discard valuable information about spatial patterns by converting continuous variables into binary categories. This includes representing geographic ranges using binary presence-absence data, classifying statistical significance into categories, and quantifying biogeographic gradients into discrete regions. In this paper we show how a full suite of spatial phylogenetic analyses, including analyses of alpha and beta diversity, neo- and paleo-endemism, biogeographic hypothesis testing, and spatial conservation prioritization, can be implemented with "smooth" methods that never remove information content by categorizing continuous data. Our analysis focuses on the bryophytes of California, an understudied group in a global plant biodiversity hotspot. Using a time-calibrated phylogeny and species distribution models for 548 species of mosses and liverworts, we profile the evolutionary diversity, compositional turnover, and conservation value of bryophyte communities across the state. Our results highlight important patterns in the diversity of this key plant group, while our methods can serve as a model for future studies seeking to maximize the information content of spatial phylogenetic analyses.
Knapp, R.; Johnson, B.; Busta, L.
Premise: Recently, plant science has seen transformative advances in scalable data collection for sequence and chemical data. These large datasets, combined with machine learning, revealed that conducting plant metabolic research on large scales yields remarkable insights. A key next step in increasing scale has been revealed with the advent of accessible large language models, which, even in their early stages, can distill structured data from literature. This brings us closer to creating specialized databases that consolidate virtually all published knowledge on a topic. Methods: Here, we first test different combinations of prompt engineering technique and language model in the identification of validated enzyme-product pairs. Next, we evaluate automated prompt engineering and retrieval augmented generation applied to identifying compound-species associations. Finally, we build and determine the accuracy of a multimodal language model-based pipeline that transcribes images of tables into machine-readable formats. Results: When tuned for each specific task, these methods perform with high accuracies (80-90 percent for enzyme-product pair identification and table image transcription), or with modest accuracies (50 percent) but lower false-negative rates than previous methods (down to 40 percent from 55 percent) for compound-species pair identification. Discussion: We enumerate several suggestions for working with language models as researchers, among which is the importance of the user's domain-specific expertise and knowledge. Significance Statement: Scientific databases have played a major role in advancing metabolic research. However, even today's advanced databases are incomplete and/or are not built to best suit certain research tasks. Here, we explored and evaluated the use of large language models and various prompt engineering techniques to expand and subset existing databases in task-specific ways.
Our results illustrate the potential for high-accuracy additions and restructurings of existing databases using language models, assuming the specific methods by which the models are used are tuned and validated for the specific task. These findings are important because they outline a method by which we could greatly expand existing databases and rapidly tailor them to specific research efforts, leading to greater research productivity and effective utilization of past research findings. All authors collected data, analyzed data, prepared the manuscript, and approved its final version. The authors declare that they have no competing interests.
Domazetoski, V.; Kreft, H.; Bestova, H.; Wieder, P.; Koynov, R.; Zarei, A.; Weigelt, P.
Functional plant ecology aims to understand how functional traits govern the distribution of species along environmental gradients, the assembly of communities, and ecosystem functions and services. The rapid rise of functional plant ecology has been fostered by the mobilization and integration of global trait datasets, but significant knowledge gaps remain about the functional traits of the ~380,000 vascular plant species worldwide. The acquisition of urgently needed information through field campaigns remains challenging, time-consuming and costly. An alternative and so far largely untapped resource for trait information is represented by texts in books, research articles and on the internet which can be mobilized by modern machine learning techniques. Here, we propose a natural language processing (NLP) pipeline that automatically extracts trait information from an unstructured textual description of a species and provides a confidence score. To achieve this, we employ textual classification models for categorical traits and question answering models for numerical traits. We demonstrate the proposed pipeline on five categorical traits (growth form, life cycle, epiphytism, climbing habit and life form), and three numerical traits (plant height, leaf length, and leaf width). We evaluate the performance of our new NLP pipeline by comparing results obtained using different alternative modeling approaches ranging from a simple keyword search to large language models, on two extensive databases, each containing more than 50,000 species descriptions. The final optimized pipeline utilized a transformer architecture to obtain a mean precision of 90.8% (range 81.6-97%) and a mean recall of 88.6% (77.4-97%) on the categorical traits, which is an average increase of 21.4% in precision and 57.4% in recall compared to a standard approach using regular expressions.
The question answering model for numerical traits obtained a normalized mean absolute error of 10.3% averaged across all traits. The NLP pipeline we propose has the potential to facilitate the digitalization and extraction of large amounts of plant functional trait information residing in scattered textual descriptions. Additionally, our study adds to an emerging body of NLP applications in an ecological context, opening up new opportunities for further research at the intersection of these fields.
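The regular-expression baseline and the precision and recall figures mentioned in this abstract can be made concrete with a small sketch. The keyword patterns, example descriptions, and expected labels below are all hypothetical and are not taken from the study; the point is only to show how a keyword search is scored and why it tends to miss labels (low recall).

```python
import re

# Hypothetical keyword patterns for one categorical trait (growth form).
GROWTH_FORM_PATTERNS = {
    "tree": re.compile(r"\btrees?\b", re.IGNORECASE),
    "shrub": re.compile(r"\bshrubs?\b", re.IGNORECASE),
    "herb": re.compile(r"\bherbs?\b|\bherbaceous\b", re.IGNORECASE),
}


def extract_growth_forms(description):
    """Return every growth-form label whose pattern occurs in the text."""
    return {
        label
        for label, pattern in GROWTH_FORM_PATTERNS.items()
        if pattern.search(description)
    }


def precision_recall(predicted, expected):
    """Micro-averaged precision and recall over lists of label sets."""
    true_positives = sum(len(p & e) for p, e in zip(predicted, expected))
    n_predicted = sum(len(p) for p in predicted)
    n_expected = sum(len(e) for e in expected)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_expected if n_expected else 0.0
    return precision, recall


# Toy species descriptions with hand-assigned labels (hypothetical).
descriptions = [
    "Evergreen tree to 20 m tall.",
    "Perennial herb with a woody base.",
    "Small shrub, rarely a treelet.",
]
expected = [{"tree"}, {"herb"}, {"shrub", "tree"}]

predicted = [extract_growth_forms(d) for d in descriptions]
precision, recall = precision_recall(predicted, expected)
```

In this toy case every label the patterns return is correct, but "treelet" is not matched by the word-bounded `tree` pattern, so one expected label is missed. Wording variants of this kind are a plausible reason a learned classifier would gain more in recall than in precision over a regular-expression baseline.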
Wenk, E. H.; Sauquet, H.; Gallagher, R. V.; Brownlee, R.; Boettiger, C.; Coleman, D.; Yang, S.; Auld, T.; Barrett, R. L.; Brodribb, T.; Choat, B.; Dun, L.; Ellsworth, D.; Gosper, C.; Guja, L.; Jordan, G. J.; Breton, T.; Leigh, A.; Irving, P.; Medlyn, B.; Nolan, R.; Ooi, M.; Sommerville, K. D.; Vesk, P.; White, M.; Wright, I. J.; Falster, D. S.
Traits with intuitive names, a clear scope and explicit description are essential for all trait databases. Reanalysis of data from a single database, or analyses that integrate data across multiple databases, can only occur if researchers are confident the trait concepts are consistent within and across sources. The lack of a unified, comprehensive resource for plant trait definitions has previously limited the utility of trait databases. Here we describe the AusTraits Plant Dictionary (APD), which extends the trait definitions included in the new trait database AusTraits. The development process of the APD included three steps: review and formalisation of the scope of each trait and the accompanying trait description; addition of trait meta-data; and publication in both human and machine-readable forms. Trait definitions include keywords, references and links to related trait concepts in other databases, and the traits are grouped into a hierarchy for easy searching. As well as improving the usability of AusTraits, the Dictionary will foster the integration of trait data across global and regional plant trait databases.
Baldwin, E. A.; Rogers, W. L.; Leebens-Mack, J.
Premise of the Study: Carnivory has evolved repeatedly across the plant tree of life despite being a dramatic shift from typical plant nutrient acquisition strategies. It remains largely unclear whether the evolution of carnivory takes a similar genomic trajectory. Here, we explore the genomic consequences of carnivory in the pitcher plant genus Sarracenia. Methods: We use a combination of PacBio HiFi long-read sequencing and trio-binning to assemble chromosome-scale genome sequences for S. psittacina and S. rosea. We conduct comparative analyses with other asterid genomes to evaluate patterns of gene family expansion and contraction during the transition to carnivory. Results: Both Sarracenia genomes are large (~3.5 Gbp) and highly repetitive (~87% repeats) yet only contain ~22,000 genes. This reduced gene content reflects widespread gene family contraction. In total, 3,654 gene families have contracted, including the complete loss of 934 gene families, while only 751 gene families have expanded. The gene losses are enriched for functions related to photosynthesis, including nuclear-encoded subunits of the NADH dehydrogenase (Ndh) complex, as well as immune-related genes. Conclusions: These results indicate that the evolution of carnivory in Sarracenia is associated with widespread gene loss rather than extensive gene family expansion. The loss of genes involved in photosynthesis and immune response suggests the relaxation of selection on these functions, which may be partially supplanted by prey-derived nutrient acquisition and the pitcher-associated microbiome. These chromosome-level assemblies will enable future comparative studies in plant evolution, while also serving as critical resources for the conservation of this ecologically significant lineage.
Hightower, A. T.; Chitwood, D. H.; Josephs, E. B.
Studies into the evolution and development of leaf shape have connected variation in plant form, function, and fitness. For species with consistent leaf margin features, patterns in leaf architecture are related to both biotic and abiotic factors. However, for species with inconsistent leaf margin features, quantifying leaf shape variation and the effects of environmental factors on leaf shape has proven challenging. To investigate leaf shape variation in species with inconsistent shapes, we analyzed approximately 500 digitized Capsella bursa-pastoris specimens collected throughout the continental U.S. over a 100-year period with geometric morphometric modeling and deterministic techniques. We generated a morphospace of C. bursa-pastoris leaf shapes and modeled leaf shape as a function of environment and time. Our results suggest C. bursa-pastoris leaf shape variation is strongly associated with temperature over the C. bursa-pastoris growing season, with lobing decreasing as temperature increases. While we expected to see changes in variation over time, our results show that the level of leaf shape variation is consistent over the 100-year period. Our findings showed that species with inconsistent leaf shape variation can be quantified using geometric morphometric modeling techniques and that temperature is the main environmental factor influencing leaf shape variation.
Hightower, A. T.; Hall, S.; Camacho, R. U.; Papamichail, A.; Adamski, E.; Colligan, C.; Deneen, A.; Dunn, G.; Haziza, J.; Henley, C.; Pawawongsak, A.; Simms, L.; Ward, S.; Balant, M.; Blackwood, C.; Cannon, C.; Case, A.; Husbands, A.; Josephs, E. H.; Migicovsky, Z.; Naegele, R.; Patterson, E.; Saavedra-Rojas, Y.-A.; Chitwood, D. H.
Premise: When examining leaf shapes that are different from one another, it can be difficult to compare both the overall leaf shape and points along the leaf margin in biologically and statistically meaningful ways. Method: To address this problem, we present a simple and user-friendly leaf shape analysis in Jupyter Notebook and Python that uses pseudo-landmarks and Generalized Procrustes Analysis to measure and compare the shape of any leaf. To demonstrate our analysis, we created a repository of real leaves gathered from eight experimental datasets. Results: Using our leaf repository, we explain how we can use pseudo-landmarks to compare all leaf shapes both within and between species using dimension reduction techniques like Principal Component Analysis and can predict leaf shapes using pseudo-landmarks through Linear Discriminant Analysis. Our leaf shape analysis also maps differences in shape as leaves grew around a rosette, showing the transition of shape across development (phyllotaxy). Finally, we showed how we can investigate the relationship between leaf shape variation and genetic diversity by combining shape with genetic data. Discussion: Through the use of Generalized Procrustes Analysis and pseudo-landmarks, our leaf shape analysis presents a powerful tool for examining the shape of any leaf across multiple biological, ecological, evolutionary, and developmental scales.
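As an illustration of the Generalized Procrustes Analysis step named in this abstract, the following is a minimal NumPy sketch, not the authors' notebook code. It assumes each leaf outline has already been reduced to the same number of 2D pseudo-landmarks in corresponding order; the function names and the synthetic test shapes are hypothetical.

```python
import numpy as np


def align_to(shape, reference):
    """Rotate a centered, unit-size shape onto a reference shape.

    Solves the orthogonal Procrustes problem via SVD. Note that this
    simple version also permits reflections.
    """
    u, _, vt = np.linalg.svd(shape.T @ reference)
    return shape @ (u @ vt)


def generalized_procrustes(shapes, n_iter=10):
    """Align an (n_shapes, n_points, 2) array of landmark sets.

    Removes translation (centering), scale (unit centroid size), and
    rotation (iterative alignment to the running mean shape).
    """
    shapes = np.asarray(shapes, dtype=float)
    shapes = shapes - shapes.mean(axis=1, keepdims=True)
    shapes = shapes / np.linalg.norm(shapes, axis=(1, 2), keepdims=True)
    mean = shapes[0]
    for _ in range(n_iter):
        shapes = np.array([align_to(s, mean) for s in shapes])
        mean = shapes.mean(axis=0)
        mean = mean / np.linalg.norm(mean)
    return shapes, mean


# Synthetic check (hypothetical data): a shape and a rotated, scaled,
# shifted copy of it should coincide after alignment.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
copy = 2.5 * base @ rotation + np.array([3.0, -1.0])

aligned, mean_shape = generalized_procrustes([base, copy])
```

After alignment, the flattened coordinates (`aligned.reshape(len(aligned), -1)`) are in a common frame and could be passed to a dimension reduction method such as Principal Component Analysis.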
Hodge, J. G.; Li, Q.; Doust, A.
Assessing the phenotypes underlying plant growth and development is integral to exploring the development, genetics, and evolution of morphology and plays an essential role in agronomic and basic research studies. Although various automated or semi-automated phenomic approaches have recently been developed, assessing the differential growth of plant organs remains a key topic of interest, but one that is often difficult to analyze because specific structures or positions in the plant body must be segmented and annotated in time-series data. To address this gap, we have developed a generalized workflow linking our previously published function, acute, with a companion function, homology, in the PlantCV environment. The homology function uses a generalized strategy of dimensionality reduction via starscape followed by hierarchical clustering through constella to identify constellations of segments in eigenspace that represent the same landmark in consecutive images of a time-series. We devised a quality control function, constellaQC, that can test the accuracy of the clustering approach, and we use it to show that the approach accurately clustered the pseudo-landmarks derived from acute, although with several sources of error. We discuss the reasons for and consequences of these errors in automated workflows, and suggest how to develop these functions so that they can easily be repurposed for other phenomics datasets that may vary in dimensional complexity.
Taylor, S. D.; Guralnick, R. P.
Premise: Research on large-scale patterns of phenology has used multiple sources of data to analyze the timing of events such as flowering, fruiting, and leaf out. In situ observations from standardized surveys are ideal but remain spatially sparse. Herbarium records and phenology-focused citizen science programs provide a source of historic data and spatial replication, but the sample sizes for any one season are still relatively low. A novel and rapidly growing source of broad-scale phenology data is photographs from the iNaturalist platform, but methods using these data must generalize to a range of species with varying season lengths that occur across heterogeneous areas. They must also be robust to different sample sizes and potential biases toward well-travelled areas such as roads and towns. Methods/Results: We developed a spatially explicit model, the Weibull Grid, to estimate flowering onset across large scales, and used a simulation framework to test the approach under different phenology and sampling scenarios. We found that the model is ideal when the underlying phenology is non-linear across space. We then used the Weibull Grid model to estimate flowering onset of two species using iNaturalist photographs, and compared those estimates with independent observations of greenup from the Phenocam network. The Weibull Grid model estimates consistently aligned with Phenocam greenup across four years and broad latitudes. Conclusion: iNaturalist observations can considerably increase the number of phenology observations and also provide needed spatial coverage. We show here that they can accurately describe large-scale trends as long as phenological and sampling processes are considered.
Whitley, B. S.; Abermann, J.; Alsos, I. G.; Biersma, E. M.; Gardman, V.; Hoye, T. T.; Jones, L.; Khelidj, N. M.; Li, Z.; Losapio, G.; Pape, T.; Raundrup, K.; Schmitz, P.; Silva, T.; Wirta, H.; Roslin, T.; Ahlstrand, N. I.; de Vere, N.
International efforts to digitise herbarium specimens provide the building blocks for a global digital herbarium. However, taxonomic changes and errors can introduce inconsistencies when specimen metadata are amalgamated, compromising the assignment of occurrence records to the correct taxa and the subsequent interpretation of patterns in biodiversity. We present a novel workflow to mass-curate digital specimens. By employing existing digital taxonomic backbones, we aggregate specimen names by their accepted name and flag remaining cases for manual review. We then validate names using site-specific floras, balancing automation with taxonomic expert-based curation. Applying our workflow to the vascular plants of Greenland, we harmonised 175,266 digitised herbarium specimens and observations from 92 data providers from the Global Biodiversity Information Facility (GBIF). The harmonised metacollection for the Greenland flora contains 780 plant species. Our workflow increases the number of species known from Greenland compared to other currently available species checklists and increases the mean number of occurrences per species by 42.6. Our workflow illustrates the integration required to create a global, universally accessible digital herbarium, and shows how previous obstacles to database curation can be overcome through a combination of automation and expert curation. From the specific perspective of the Greenland flora, our approach arrives at a new checklist of taxa, a new curated metacollection of occurrence data, and revised estimates of plant richness. The list of taxa and their prevalence provide a new basis for biodiversity assessment and conservation planning. Societal Impact Statement: Digitising plant collections has allowed data to be aggregated across multiple collections, forming a single harmonised resource of unprecedented scale. This resource is only accurate once the database names are assigned to one accepted name per species. 
We established a semi-automated workflow for processing plant name data, leveraging taxonomic backbones and employing taxonomic expertise at key stages. Applying our workflow to the flora of Greenland, we developed a curated checklist of 780 species, capturing greater species richness than previously published, while also curating 175,266 plant records. Our findings redefine our knowledge of Greenlandic plant diversity, while harmonising a vast digital collection for further research.
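The core of such a semi-automated workflow, grouping records under an accepted name and flagging everything else for an expert, can be sketched in a few lines. This is an illustrative sketch only, not the authors' pipeline: the backbone is a hypothetical in-memory dictionary, the taxon names are invented placeholders, and name matching is exact after whitespace clean-up.

```python
from collections import defaultdict

# Hypothetical backbone: every known name (accepted or synonym) maps to
# its accepted name. The names are invented placeholders, not real taxa.
BACKBONE = {
    "Planta alba": "Planta alba",
    "Planta candida": "Planta alba",      # treated here as a synonym
    "Herba borealis": "Herba borealis",
}

def harmonise(records, backbone):
    """Group record IDs by accepted name and flag unmatched names for review."""
    by_accepted = defaultdict(list)
    needs_review = []
    for record_id, name in records:
        cleaned = " ".join(name.split())  # collapse stray whitespace
        accepted = backbone.get(cleaned)
        if accepted is None:
            needs_review.append((record_id, name))
        else:
            by_accepted[accepted].append(record_id)
    return dict(by_accepted), needs_review

records = [
    ("rec1", "Planta alba"),
    ("rec2", "Planta  candida"),   # synonym with a doubled space
    ("rec3", "Herba borealis"),
    ("rec4", "Planta albba"),      # misspelling, left for manual review
]
grouped, flagged = harmonise(records, BACKBONE)
print(grouped)  # → {'Planta alba': ['rec1', 'rec2'], 'Herba borealis': ['rec3']}
print(flagged)  # → [('rec4', 'Planta albba')]
```

Keeping unmatched names in a separate review list, rather than guessing a match, mirrors the balance between automation and expert curation described in the abstract.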
Pezzi, P. H.; Latvis, M.
Orobanchaceae is the largest family of parasitic plants, encompassing a full spectrum of parasitic strategies, ranging from autotrophic to holoparasitic. Agalinis is a genus of facultative hemiparasites comprising about 70 species distributed throughout the Americas, including several endemic and rare taxa. Agalinis fasciculata, the beach false foxglove, is a widely distributed species across southeastern North America. Here, we use PacBio HiFi, Omni-C, and RNA-seq data to generate the first high-quality reference genome for the genus. The nuclear genome is 2.29 Gb in size, with most sequences anchored to 14 pseudochromosomes and an N50 of 162 Mb. BUSCO analyses indicate high completeness (98.4%). Structural genome annotation identified 34,133 protein-coding genes and 39,266 transcripts, most of which have at least one functional annotation. The plastid and mitochondrial genomes were also assembled. We further examined genetic diversity and demographic history in A. fasciculata, revealing low genome-wide heterozygosity and evidence of inbreeding. This reference genome is an important resource for understanding the evolutionary history of the genus and the evolutionary patterns of parasitism within Orobanchaceae. Significance: This high-quality genome is the first chromosome-level assembly for Agalinis, a hemiparasitic genus in the plant family Orobanchaceae. It improves the taxon sampling within Orobanchaceae, representing an important resource for investigating patterns of genome evolution in parasitic lineages. Furthermore, Agalinis has served as a focal genus for studies of the anatomy of haustorial development, and genome annotation incorporated RNA from multiple tissues, enabling the identification of genes expressed in different tissues, including roots and haustoria. This genome also serves as a reference for evolutionary studies of other Agalinis species, many of which are endemic and of conservation concern in North and South America. 
Overall, the beach false foxglove genome will support studies of the evolutionary history of Agalinis and genome evolution across Orobanchaceae.
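For readers unfamiliar with the N50 statistic quoted above, the sketch below shows how it is conventionally computed: it is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the toy lengths are assumptions for illustration and are not taken from the Agalinis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downward until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 requires at least one sequence length")

# Toy lengths (arbitrary units): total is 30, and 10 + 8 = 18 covers half.
print(n50([10, 8, 6, 4, 2]))  # → 8
```

A large N50 relative to genome size, as reported for this assembly, indicates that most of the sequence sits in a few long scaffolds.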